    A fast and simple algorithm for constructing minimal acyclic deterministic finite automata

    In this paper, we present a fast and simple algorithm for constructing a minimal acyclic deterministic finite automaton from a finite set of words. Such automata are useful in a wide variety of applications, including computer virus detection, computational linguistics and computational genetics. There are several known algorithms that solve the same problem, though most of the alternative algorithms are considerably more difficult to present, understand and implement than the one given here. Preliminary benchmarking indicates that the algorithm presented here is competitive with the other known algorithms.
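
    The paper's own algorithm is not reproduced here, but the core idea behind minimal ADFA construction can be illustrated with a generic sketch: build a trie for the word set, then merge equivalent states bottom-up through a register of canonical states. All identifiers are illustrative; this is a standard bottom-up approach, not necessarily the algorithm the paper presents.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// A trie node: outgoing transitions on characters plus a finality flag.
struct Node {
    bool is_final = false;
    std::map<char, Node*> edges;
};

// Build a plain trie for the word set.
Node* buildTrie(const std::vector<std::string>& words) {
    Node* root = new Node();
    for (const auto& w : words) {
        Node* s = root;
        for (char c : w) {
            auto it = s->edges.find(c);
            if (it == s->edges.end()) it = s->edges.emplace(c, new Node()).first;
            s = it->second;
        }
        s->is_final = true;
    }
    return root;
}

// Bottom-up merging: two states are equivalent iff they agree on finality
// and have identical transitions to already-canonicalized targets. The
// "register" maps each such signature to a canonical representative, so
// the surviving states form the minimal ADFA. (Displaced duplicate nodes
// are leaked here, for brevity.)
Node* canonicalize(Node* s, std::map<std::string, Node*>& reg) {
    std::string sig(s->is_final ? "1" : "0");
    for (auto& [c, t] : s->edges) {
        t = canonicalize(t, reg);
        sig += c;
        sig += std::to_string(reinterpret_cast<std::uintptr_t>(t));
        sig += ';';
    }
    return reg.emplace(sig, s).first->second;
}

int main() {
    std::vector<std::string> words = {"tap", "taps", "top", "tops"};
    std::map<std::string, Node*> reg;
    canonicalize(buildTrie(words), reg);
    std::cout << "states in minimal ADFA: " << reg.size() << "\n";  // 5
}
```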

    A Boyer-Moore type algorithm for regular expression pattern matching


    A new family and structure for Commentz-Walter-style multiple-keyword pattern matching algorithms

    In this paper, I present a new family of Commentz-Walter-style multiple-keyword string pattern matching algorithms. The algorithms share a common algorithmic skeleton, which is significantly optimized when compared to the original Commentz-Walter skeleton and subsequently derived improvements. The new skeleton is derived via correctness-preserving stepwise algorithmic improvements, in the Eindhoven style of programming.
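
    The optimized skeleton itself is given in the paper; as a taste of the Commentz-Walter family it belongs to, the following sketch implements Set Horspool, one of the simplest members: a trie of reversed keywords is scanned right to left within a window, and a precomputed bad-character table supplies a safe shift. Identifiers are illustrative, not the paper's.

```cpp
#include <algorithm>
#include <array>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Trie over reversed keywords; 'word' is set at nodes where a keyword ends.
struct Trie {
    std::map<char, std::unique_ptr<Trie>> next;
    const std::string* word = nullptr;
};

// Report every keyword occurrence in 'text' (keyword and ending position).
void setHorspool(const std::string& text, const std::vector<std::string>& kws) {
    std::size_t minlen = kws[0].size();
    for (const auto& k : kws) minlen = std::min(minlen, k.size());

    // Trie of reversed keywords, so matching can proceed right to left.
    Trie root;
    for (const auto& k : kws) {
        Trie* t = &root;
        for (auto it = k.rbegin(); it != k.rend(); ++it) {
            auto& child = t->next[*it];
            if (!child) child = std::make_unique<Trie>();
            t = child.get();
        }
        t->word = &k;
    }

    // Bad-character shifts: shift[c] is the least distance from the right
    // end of any keyword (within its last minlen symbols) at which c occurs.
    // Shifting the window by this amount can never skip a match: it is safe.
    std::array<std::size_t, 256> shift;
    shift.fill(minlen);
    for (const auto& k : kws)
        for (std::size_t d = 1; d < minlen; ++d) {
            unsigned char c = static_cast<unsigned char>(k[k.size() - 1 - d]);
            shift[c] = std::min(shift[c], d);
        }

    // Slide the window; scan backwards from its right end through the trie.
    for (std::size_t e = minlen - 1; e < text.size();
         e += shift[static_cast<unsigned char>(text[e])]) {
        const Trie* t = &root;
        for (std::size_t i = e; ; --i) {
            auto it = t->next.find(text[i]);
            if (it == t->next.end()) break;
            t = it->second.get();
            if (t->word) std::cout << *t->word << " ends at " << e << "\n";
            if (i == 0) break;
        }
    }
}

int main() {
    setHorspool("she sells seashells", {"she", "sell", "sea", "hell"});
}
```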

    A taxonomy of sublinear multiple keyword pattern matching algorithms

    This article presents a taxonomy of sublinear keyword pattern matching algorithms related to the Boyer-Moore algorithm [3] and the Commentz-Walter algorithm [5, 6]. The taxonomy includes, amongst others, the multiple keyword generalization of the single keyword Boyer-Moore algorithm and an algorithm by Fan and Su [9, 10]. The corresponding precomputation algorithms are presented as well. The taxonomy is based on the idea of ordering algorithms according to their essential problem and algorithm details, and deriving all algorithms from a common starting point by successively adding these details in a correctness-preserving way. This way of presentation not only provides a complete correctness argument of each algorithm, but also makes very clear what algorithms have in common (the details of their nearest common ancestor) and where they differ (the details added after their nearest common ancestor). Introduction of the notion of safe shift distances proves to be essential in the derivation and classification of the algorithms. Moreover, the article provides a common derivation for and a uniform presentation of the precomputation algorithms, not yet found in the literature.
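
    To make the notion of a safe shift distance concrete, here is a sketch of the single-keyword Horspool simplification of Boyer-Moore (an illustration in the spirit of the taxonomy, not one of its derivations verbatim): the shift looked up for the text character under the window's last position can never jump over an occurrence.

```cpp
#include <array>
#include <iostream>
#include <string>
#include <vector>

// Return all occurrence positions of 'pat' in 'text', Horspool style.
// The precomputed table records, for each character c, its distance from
// the right end of 'pat' (excluding the last position). Shifting the
// window by table[c], where c is the text character under the window's
// last position, is a safe shift: no occurrence can be skipped.
std::vector<std::size_t> horspool(const std::string& text,
                                  const std::string& pat) {
    std::vector<std::size_t> out;
    const std::size_t m = pat.size(), n = text.size();
    if (m == 0 || n < m) return out;

    std::array<std::size_t, 256> shift;
    shift.fill(m);  // characters absent from pat allow a full-length shift
    for (std::size_t j = 0; j + 1 < m; ++j)
        shift[static_cast<unsigned char>(pat[j])] = m - 1 - j;

    std::size_t pos = 0;  // left end of the current window
    while (pos + m <= n) {
        std::size_t j = m;  // compare right to left
        while (j > 0 && text[pos + j - 1] == pat[j - 1]) --j;
        if (j == 0) out.push_back(pos);
        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return out;
}

int main() {
    for (std::size_t p : horspool("abracadabra", "abra"))
        std::cout << "match at " << p << "\n";  // prints 0 and 7
}
```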

    A taxonomy of keyword pattern matching algorithms


    On compile time Knuth-Morris-Pratt precomputation

    Many keyword pattern matching algorithms use precomputation subroutines to produce lookup tables, which in turn are used to improve performance during the search phase. If the keywords to be matched are known at compile time, the precomputation subroutines can be evaluated at compile time rather than at run time, providing a performance boost to run-time operations. We have started an investigation into the use of metaprogramming techniques to implement such compile time evaluation, initially for the Knuth-Morris-Pratt (KMP) algorithm. We present an initial experimental comparison of the performance of the traditional KMP algorithm to that of an optimised version that uses compile time precomputation. During implementation and benchmarking, it was discovered that C++ is not well suited to metaprogramming when dealing with strings, while the related D language is. We therefore ported our implementation to the latter and performed the benchmarking with that version. We discuss the design of the benchmarks, our experience implementing them in C++ and D, and the results of the D benchmarks. The results show that, under certain circumstances, the use of compile time precomputation may significantly improve the performance of the KMP algorithm.
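
    The paper's implementation uses D's compile-time function evaluation; as a sketch of the same idea in modern C++ (C++17 constexpr, which postdates the template-metaprogramming difficulties the paper describes), the KMP failure table can be computed entirely at compile time:

```cpp
#include <array>
#include <cstddef>
#include <iostream>
#include <string_view>

// KMP failure (border) table, evaluated at compile time when the pattern
// is a compile-time constant: fail[i] is the length of the longest proper
// border of pat[0..i].
template <std::size_t N>
constexpr std::array<std::size_t, N> kmpFailure(std::string_view pat) {
    std::array<std::size_t, N> fail{};
    for (std::size_t i = 1; i < N; ++i) {
        std::size_t k = fail[i - 1];
        while (k > 0 && pat[i] != pat[k]) k = fail[k - 1];
        if (pat[i] == pat[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Standard KMP search phase, consuming the precomputed table.
template <std::size_t N>
std::size_t kmpFind(std::string_view text, std::string_view pat,
                    const std::array<std::size_t, N>& fail) {
    std::size_t q = 0;  // number of pattern characters currently matched
    for (std::size_t i = 0; i < text.size(); ++i) {
        while (q > 0 && text[i] != pat[q]) q = fail[q - 1];
        if (text[i] == pat[q]) ++q;
        if (q == N) return i + 1 - N;  // first match position
    }
    return std::string_view::npos;
}

int main() {
    constexpr std::string_view kPat = "ababcab";
    // The table is a compile-time constant: no run-time precomputation.
    constexpr auto kFail = kmpFailure<kPat.size()>(kPat);
    static_assert(kFail[6] == 2, "longest border of 'ababcab' has length 2");
    std::cout << kmpFind<kPat.size()>("zababcabz", kPat, kFail) << "\n";  // 1
}
```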

    Hardcoding and dynamic implementation of finite automata

    The theoretical complexity of a string recognizer is linear in the length of the string being tested for acceptance. In practice, however, for some kinds of strings the processing time depends largely on the number of states visited by the recognizer at run time. Various experiments are conducted in order to compare the time efficiency of hardcoded and table-driven algorithms on such string patterns. The results of the experiments are cross-compared in order to show the efficiency of the hardcoded algorithm over its table-driven counterpart. This furthers the investigation of the problem of the dynamic implementation of finite automata. It is shown that, in the dynamic framework, the history of previously visited states can be relied upon to predict the suitable algorithm for acceptance testing.
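
    The contrast between the two implementation styles can be sketched on a toy automaton (binary strings with an even number of 1s); this is an illustration, not the paper's benchmark code. The table-driven recognizer walks a transition matrix, while the hardcoded one compiles the same automaton into branches and jumps, which is why its timing is sensitive to the sequence of states visited:

```cpp
#include <iostream>
#include <string>

// Table-driven: the automaton is data. State 0 (even number of 1s seen,
// accepting) and state 1 (odd); input is assumed to be over {'0','1'}.
bool tableDriven(const std::string& s) {
    static const int delta[2][2] = {{0, 1}, {1, 0}};  // delta[state][symbol]
    int state = 0;
    for (char c : s) state = delta[state][c - '0'];
    return state == 0;
}

// Hardcoded: the same automaton as control flow. Each state is a label and
// each transition a jump, so no table lookup happens at run time.
bool hardcoded(const std::string& s) {
    std::size_t i = 0;
even:
    if (i == s.size()) return true;   // accepting state
    if (s[i++] == '1') goto odd;
    goto even;
odd:
    if (i == s.size()) return false;  // rejecting state
    if (s[i++] == '1') goto even;
    goto odd;
}

int main() {
    std::cout << tableDriven("1101") << hardcoded("1101") << "\n";  // 00
    std::cout << tableDriven("1001") << hardcoded("1001") << "\n";  // 11
}
```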

    On minimizing deterministic tree automata

    We present two algorithms for minimizing deterministic frontier-to-root tree automata (dfrtas) and compare them with their string counterparts. The presentation is incremental, starting out from definitions of minimality of automata and state equivalence, in the style of earlier algorithm taxonomies by the authors. The first algorithm is the classical one, initially presented by Brainerd in the 1960s and presented (sometimes imprecisely) in standard texts on tree language theory ever since. The second algorithm is completely new. This algorithm, essentially representing the generalization to ranked trees of the string algorithm presented by Watson and Daciuk, incrementally minimizes a dfrta. As a result, intermediate results of the algorithm can be used to reduce the initial automaton’s size. This makes the algorithm useful in situations where running time is restricted (for example, in real-time applications). We also briefly sketch how a concurrent specification of the algorithm in CSP can be obtained from an existing specification for the dfa case.
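
    The string counterpart of the classical algorithm is, in its usual form, Moore-style partition refinement; the following sketch minimizes an ordinary DFA (the paper generalizes this setting to ranked trees, which is not attempted here). Identifiers are illustrative.

```cpp
#include <iostream>
#include <map>
#include <vector>

// Moore-style minimization of a complete DFA. delta[q][a] is the successor
// of state q on symbol a; accepting[q] marks final states. Returns the
// number of equivalence classes, i.e. the size of the minimal DFA.
int minimize(const std::vector<std::vector<int>>& delta,
             const std::vector<bool>& accepting) {
    const int n = static_cast<int>(delta.size());
    // Initial partition: accepting versus non-accepting states.
    std::vector<int> block(n);
    for (int q = 0; q < n; ++q) block[q] = accepting[q] ? 1 : 0;

    int numBlocks = 0;
    for (;;) {
        // Refine: two states stay together iff their current blocks agree
        // and their successors land in the same blocks for every symbol.
        std::map<std::vector<int>, int> sig2block;
        std::vector<int> next(n);
        for (int q = 0; q < n; ++q) {
            std::vector<int> sig{block[q]};
            for (int t : delta[q]) sig.push_back(block[t]);
            next[q] = sig2block.emplace(sig, (int)sig2block.size()).first->second;
        }
        // Each pass only refines the partition, so an unchanged block count
        // means a fixpoint has been reached.
        if ((int)sig2block.size() == numBlocks) return numBlocks;
        numBlocks = (int)sig2block.size();
        block = next;
    }
}

int main() {
    // A 3-state DFA for "strings over {a,b} ending in b", where state 2
    // duplicates state 0; minimization merges them.
    std::vector<std::vector<int>> delta = {{2, 1}, {0, 1}, {0, 1}};
    std::vector<bool> accepting = {false, true, false};
    std::cout << "minimal size: " << minimize(delta, accepting) << "\n";  // 2
}
```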

    Simultaneous interval regression for K-nearest neighbor

    In some regression problems, it may be more reasonable to predict intervals rather than precise values. We are interested in finding intervals which, simultaneously for all input instances x ∈ X, contain a β proportion of the response values. We name this problem simultaneous interval regression. This is similar to simultaneous tolerance intervals for regression with a high confidence level γ ≈ 1, and several authors have already treated this problem for linear regression. Such intervals can be seen as a form of confidence envelope for the prediction variable given any value of the predictor variables in their domain. Tolerance intervals and simultaneous tolerance intervals have not yet been treated for the K-nearest neighbor (KNN) regression method. The goal of this paper is to consider the simultaneous interval regression problem for KNN, and this is done without the homoscedasticity assumption. In this scope, we propose a new interval regression method based on KNN which takes advantage of tolerance intervals in order to choose, for each instance, the value of the hyper-parameter K which will be a good trade-off between the precision and the uncertainty due to the limited sample size of the neighborhood around each instance. In the experimental part, our proposed interval construction method is compared with a more conventional interval approximation method on six benchmark regression data sets.
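
    As a rough illustration of interval prediction with KNN, the sketch below returns the empirical quantile range of a fixed-K neighborhood's responses; unlike the paper's method, it neither tunes K per instance via tolerance intervals nor provides simultaneous coverage guarantees.

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <utility>
#include <vector>

// For a query point, take the k nearest training instances and return the
// empirical [(1-beta)/2, (1+beta)/2] quantile range of their responses as
// the predicted interval. A plain fixed-k sketch for 1-D inputs.
std::pair<double, double> knnInterval(const std::vector<double>& x,
                                      const std::vector<double>& y,
                                      double query, std::size_t k,
                                      double beta) {
    // Collect the responses of the k nearest neighbors of 'query'.
    std::vector<std::pair<double, double>> byDist;
    for (std::size_t i = 0; i < x.size(); ++i)
        byDist.push_back({std::fabs(x[i] - query), y[i]});
    std::partial_sort(byDist.begin(), byDist.begin() + k, byDist.end());
    std::vector<double> ys;
    for (std::size_t i = 0; i < k; ++i) ys.push_back(byDist[i].second);
    std::sort(ys.begin(), ys.end());

    // Linearly interpolated empirical quantile of the neighborhood.
    auto quantile = [&](double p) {
        double idx = p * (k - 1);
        std::size_t lo = (std::size_t)idx;
        std::size_t hi = std::min(lo + 1, k - 1);
        return ys[lo] + (idx - lo) * (ys[hi] - ys[lo]);
    };
    return {quantile((1.0 - beta) / 2.0), quantile((1.0 + beta) / 2.0)};
}

int main() {
    // Toy data with hand-picked, growing noise (heteroscedastic flavour).
    std::vector<double> x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    std::vector<double> y = {1.0, 1.1, 0.9, 1.2, 0.8, 2.5, 1.5, 3.0, 0.5, 3.5};
    auto [lo, hi] = knnInterval(x, y, 2.5, 5, 0.8);
    std::cout << "interval: [" << lo << ", " << hi << "]\n";
}
```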